Abstract


RNA interference is applied to SARS-related pharmaceutical research and development.
Using a bioinformatic method, twenty-seven sequence segments of 21-25 bases in the SARS-CoV
genome are predicted as optimal target sites for small interfering RNA duplexes.


SARS (severe acute respiratory syndrome) was first identified in Guangdong Province, China,
and rapidly spread to many regions in China and around the world. It caused death and disaster
to thousands of people. However, no effective drug for treating SARS has yet been found.
The genome sequences determined by several groups [1]-[3] show that the virus is a variant of
the coronaviruses, which are single-stranded plus-sense RNA viruses. The genome is about 30 kb
in length, and several of its encoded proteins have been separated and purified. This provides
a sound basis for SARS-related pharmaceutical research and development. The use of
double-stranded RNA (dsRNA) to manipulate gene expression (RNA interference, or RNAi) has
proved highly effective, at least 10 times more effective than using either sense or antisense
RNA alone [4]. RNAi triggered by dsRNA is a phenomenon of homology-dependent gene
silencing [5][6][7]. Small interfering RNA (siRNA, 21-25 nt long) was found to play an
important role in RNAi-related gene silencing pathways [8]. Progress has also been made in
anti-HIV and anti-HCV drug design by applying RNA interference [9]-[10]. To design an
anti-SARS-CoV drug, one strategy is to search for siRNAs that specifically interfere with
gene expression and block genome replication of the SARS-associated coronavirus. In this note
we make theoretical predictions of the possible siRNA target sites in the virus genome.

Upon infection of an appropriate host cell, the viral envelope fuses with the cell membrane
and the viral plus-sense RNA enters the host cell. The 5'-most ORF of the viral genome is then
translated into several nonstructural proteins, including an RNA-dependent RNA polymerase and
an ATPase helicase. These proteins in turn are responsible for replicating the viral genome
and for generating the nested transcripts that are used in the synthesis of the viral proteins.
Transcriptionally active, subgenomic-size minus strands have also been discovered [2].
siRNA-mediated RNA interference is highly specific, and may play a role in affecting the
process of virus expression and proliferation.

RNA secondary structure is composed of double-stranded regions of stacked base pairs
(stems) and single-stranded regions (loops). RNA structure predictions comprise base-paired and
non-base-paired regions in various types of loop and junction arrangements (including hairpin
loops, bulge loops, interior loops, and junctions or multi-loops [11]). Only non-base-paired
regions 21-25 nt long (or longer) can serve as target sites of siRNA. They are called free
segments. Long non-base-paired regions containing one or several short stems (total stem
length 1-3 base pairs) are also considered in our statistics. These are called quasi-free
segments.

Using the program RNAstructure (version 3.7) [12], we folded the RNA sequence of the viral
genome in a window of 3000 nucleotides and shifted the window by 1500 nucleotides at each step
along the sequence, so that each site in the virus genome participated in more than 10
different folds. We selected free and quasi-free segments 21-25 bases long as candidate target
sites of RNA interference when these segments frequently occurred in non-base-paired regions
in the above calculation. A given RNA sequence segment may have different secondary-structure
configurations of low free energy, some containing short stems (quasi-free) and some not
(free). The total frequency with which a segment occurs in the non-base-paired regions of
different folds is called its appearance rate. If each quasi-free case is multiplied by a
reduction factor in the count, namely 0.9 for 1 base pair, 0.8 for 2 base pairs, and 0.7 for
3 base pairs (the base pairs may be contiguous in the structure or separated), then the
resulting total is called the reduced appearance rate.
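
The bookkeeping described above can be sketched as follows. This is a minimal illustration
under stated assumptions, not the authors' code: the function names window_starts and
appearance_rates are hypothetical, the handling of a final partial window is an assumption,
and the folding itself (done with RNAstructure in the text) is not reproduced. The input to
appearance_rates is assumed to be, for each fold in which the segment is free or quasi-free,
the number of base pairs found inside the segment.

```python
from typing import List, Sequence, Tuple

# Reduction factors quoted in the text, keyed by the number of base pairs
# inside the segment (0 = free segment, 1-3 = quasi-free segment).
REDUCTION = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.7}


def window_starts(genome_length: int, window: int = 3000, step: int = 1500) -> List[int]:
    """Return 0-based start positions of the folding windows.

    Assumption: if the regular grid does not reach the end of the genome,
    one extra window is placed flush with the end.
    """
    if genome_length <= window:
        return [0]
    starts = list(range(0, genome_length - window + 1, step))
    if starts[-1] + window < genome_length:
        starts.append(genome_length - window)
    return starts


def appearance_rates(pairs_per_fold: Sequence[int]) -> Tuple[int, float]:
    """Return (appearance rate, reduced appearance rate) for one segment."""
    for n in pairs_per_fold:
        if n not in REDUCTION:
            raise ValueError(f"segment with {n} base pairs is neither free nor quasi-free")
    rate = len(pairs_per_fold)
    reduced = round(sum(REDUCTION[n] for n in pairs_per_fold), 1)
    return rate, reduced


if __name__ == "__main__":
    print(len(window_starts(30000)))       # number of windows for a 30 kb sequence
    print(appearance_rates([0, 0, 1, 2]))  # two free folds, two quasi-free folds
```

With these factors, a segment that is quasi-free with three base pairs in each of nine folds
would have an appearance rate of 9 and a reduced appearance rate of 6.3.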

Antisense oligonucleotides (AOs) complementary to a specific subsequence of an RNA
target have been extensively investigated. AO efficacy is affected by many factors. Apart from
the binding energy between AO and RNA, which describes the accessibility of the RNA to the AO,
the sequence motif is another important factor. The correlation of 9 sequence motifs with AO
efficacy was deduced empirically in [13][14]. If the target sequence contains any of the
motifs CCAC, TCCC, ACTC, GCCA, or CTCT, it receives a positive score. If the target sequence
contains any of GGGG, ACTG, TAA, or AAA, it receives a negative score.
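
A motif scan of this kind can be sketched as below. The empirical weights of [13][14] are not
given in this note, so the sketch only counts motif occurrences and reports a unit-weight
balance; the unit weights, the counting of overlapping occurrences, and the function names
are assumptions made for illustration, and the result is not the AO efficacy value listed in
the tables.

```python
from typing import Dict

# Motifs named in the text, written in the DNA alphabet as in the source.
POSITIVE_MOTIFS = ("CCAC", "TCCC", "ACTC", "GCCA", "CTCT")
NEGATIVE_MOTIFS = ("GGGG", "ACTG", "TAA", "AAA")


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_counts(target: str) -> Dict[str, int]:
    """Count each of the nine motifs in a target given as RNA or DNA."""
    dna = target.upper().replace("U", "T")
    return {m: count_overlapping(dna, m) for m in POSITIVE_MOTIFS + NEGATIVE_MOTIFS}


def motif_balance(target: str) -> int:
    """Positive-motif count minus negative-motif count (unit weights, an assumption)."""
    counts = motif_counts(target)
    positive = sum(counts[m] for m in POSITIVE_MOTIFS)
    negative = sum(counts[m] for m in NEGATIVE_MOTIFS)
    return positive - negative


if __name__ == "__main__":
    print(motif_counts("CUCCCUUCGAAUUGUUAUAGU"))
    print(motif_balance("CUCCCUUCGAAUUGUUAUAGU"))
```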

On the other hand, experiments show that the 2-nt 3' overhangs of the siRNA duplex play an
important role in its stabilization [8]. This means that AA at the 5' end of the sequence
segment is favorable for a target.

In the SARS-CoV genome we found several tens of long segments (length > 20 nt) that match
human sequences. To guarantee the safety of the designed drug, we aligned the free and
quasi-free segments of high appearance rate against the human genome and deleted those with a
match (more than 18 exactly matching bases) from the siRNA target candidates.
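
The exclusion rule can be illustrated with a small exact-match filter. This is a toy sketch
only: the note does not say which alignment tool was used, a genome-scale search would need
a dedicated aligner rather than an in-memory set, only one strand of the reference is
checked, and all names here (MIN_MATCH, filter_candidates, and so on) are hypothetical.
"More than 18 exactly matching bases" is read as a shared exact run of at least 19 bases.

```python
from typing import Iterable, List, Set

MIN_MATCH = 19  # "more than 18 exactly matching bases"


def normalize(seq: str) -> str:
    """Upper-case and map RNA to the DNA alphabet so both can be compared."""
    return seq.upper().replace("U", "T")


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_exact_match(candidate: str, reference_kmers: Set[str], k: int = MIN_MATCH) -> bool:
    """True if the candidate shares an exact run of k bases with the reference."""
    cand = normalize(candidate)
    return any(cand[i:i + k] in reference_kmers for i in range(len(cand) - k + 1))


def filter_candidates(candidates: Iterable[str], reference: str, k: int = MIN_MATCH) -> List[str]:
    """Keep only candidates without an exact run of k bases in the reference."""
    ref_kmers = kmers(normalize(reference), k)
    return [c for c in candidates if not has_exact_match(c, ref_kmers, k)]


if __name__ == "__main__":
    toy_reference = "ACGT" * 10              # stand-in for a host sequence
    hit = "ACGUACGUACGUACGUACGUA"            # 21 matching bases: removed
    near = "ACGTACGTACGTACGTACCCC"           # only 18 matching bases: kept
    print(filter_candidates([hit, near], toy_reference))
```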

Using the RNA sequence data of SARS-CoV isolate Tor2, twenty-seven optimal siRNA targets
21-25 bases long were selected from 60000 candidates on both strands. They are listed in
Tables 1 and 2 for the minus strand and plus strand, respectively. Each segment is scored.
The main term of the score is the reduced appearance rate (column 5 of Tables 1 and 2). The
sum of AO efficacies (multiplied by 10) in a segment is also listed for reference (column 6).
The enhancing factor of AA at the 5' end is indicated in column 7 by the notation +. The
multiple sequence alignment of 19 complete SARS coronavirus genomes gives the mutational
sites between different strains [15]. The last term of the score is related to these
mutational sites: each point mutation in the siRNA target sequence contributes -1 to the
score (column 8). Although the relative importance of these terms cannot be quantitatively
estimated at present, we expect the main contribution to the score to come from the reduced
appearance rate (column 5).
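
Two of the score terms, the AA flag and the mutational-site term, are simple enough to
sketch directly. The function names and the example mutation positions below are
hypothetical, positions are taken as 1-based and inclusive as in the tables, and the terms
are deliberately not combined into a single number because the text leaves their relative
weights open.

```python
from typing import Dict, Iterable, Union


def starts_with_aa(target: str) -> bool:
    """True if the target segment begins with AA at its 5' end (column 7, '+')."""
    return target.upper().startswith("AA")


def mutation_score(start: int, end: int, mutation_sites: Iterable[int]) -> int:
    """-1 for each known point-mutation site inside [start, end] (column 8)."""
    return -sum(1 for site in set(mutation_sites) if start <= site <= end)


def score_terms(target: str, start: int, end: int,
                mutation_sites: Iterable[int]) -> Dict[str, Union[bool, int]]:
    """Collect the AA flag and the mutational-site term for one candidate."""
    return {
        "aa_5prime": starts_with_aa(target),
        "mutation_score": mutation_score(start, end, mutation_sites),
    }


if __name__ == "__main__":
    # The mutation positions here are made up for illustration only.
    print(score_terms("AAUUUCUUGAAUUACCGCGACUAC", 1041, 1064, [1050, 2000]))
```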

Generally, in the proliferation of plus-sense RNA viruses the concentration of the plus
strand is much higher than that of the minus strand. For example, they may differ by a factor
of 100 in TMV (tobacco mosaic virus) [16]. If the concentration of the minus strand in
SARS-CoV is likewise lower, then RNA interference targeted at the viral minus strand will be
more effective. We suggest that this point be checked experimentally without delay, since it
is important for designing an effective siRNA duplex.

The above approach is of broad interest for the design of other antiviral drugs.

Table 1. siRNA target sequences in the minus strand of SARS-CoV

target sequence (5'-3')   position     length  appear.  reduced       AO eff.  AA at   mut. site
                                               rate     appear. rate  (x10)    5' end  score
AAUUUCUUGAAUUACCGCGACUAC  1041-1064    24      8        8                      +       -1
AAUUGAUCUAAGAGUAAAAAAU    4446-4467    22      9        6.3           -4.4     +       -1
UGUCAACACAAAGUAAUCACC     12902-12922  21      7        5.6           -1.2             0
CUCCCUUCGAAUUGUUAUAGU     17025-17045  21      13       10.4          2.6              0
AAGACAUCAAAAACAAAAGUG     20292-20312  21      7        4.9           -5.0     +       0
ACACCAUCUAAAGCUACACCC     21671-21691  21      11       [illegible]   -1.2             0
AAUUAGAUAAGAGUACACCAA     22788-22808  21      10       10                     +       0
UAGCAUCACGACCACACACAC     24177-24197  21      12       12            1.7              0
CUAGUAUAAAAGAAGAAUCGG     25280-25300  21      11       10.6          -2.2             0
AAUUUUAAUUCCUUUAUACUU     25327-25347  21      10       8.6                    +       0
CCUUCAAUAACUAAAUUUUCA     28430-28450  21      10       10            -1.4             0
AGCUACACAGAUUUUAAAGUU     29662-29682  21      8        7.8           -1.2             0




Table 2. siRNA target sequences in the plus strand of SARS-CoV

target sequence (5'-3')   position     length  appear.  reduced       AO eff.  AA at   mut. site
                                               rate     appear. rate  (x10)    5' end  score
AAACAAUAAUAAAUUUUACUG     128-148      21      8        8             -4.2     +       0
UUGUUUCUGUUACCUUCUCUU     11107-11127  21      12       12            1.8              0
AAUCAUUAUUAAAGACUGUA?     13921-13941  21      14       9.8           -3.0     +       0
UACCCAGAUCCAUCAAGAAUAUU   15861-15883  23      10       10                             0
UAUCUCACCUUAUAAUUCACA     17699-17719  21      9        8.1                            0
AAUUGCCUUUCUUUUACUAUU     19291-19311  21      8        7.2                    +       0
GACUACAAAAGAGAAGCCCCA     19809-19829  21      8        6.4           -2.0             0
AACCUUCUACCCAAAACUACAA    20567-20588  22      11       11            -2.0     +       0
UUUUCUUAUUAUUUCUUACUC     21499-21519  21      12       11.8          0.9              0
AUUAUUAACAAUUCUACUAAU     21837-21857  21      13       10.7                           0
AAACAACUUAGCUCUAAUUUU     24327-24347  21      13       9.1           -3.0     +       0
AUUAACAACACAGUUUAUGAU     24834-24854  21      12       10.8                           0
AAUAUGAGCAAUAUAUUAAAU     25051-25071  21      8        6.4           -1.2     +       0
AACGAACUAACUAUUAUUAUU     26347-26367  21      9        9                      +       0
AACGAACAUGAAAAUUAUUCUCUU  27266-27289  24      10       10                     +       0